NSF PAR Search | NSF Public Access Repository

FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design

https://doi.org/10.1109/MICRO61859.2024.00107

Nayak, Nandeeka; Wu, Xinrui; Odemuyiwa, Toluwanimi O; Pellauer, Michael; Emer, Joel S; Fletcher, Christopher W (November 2024, IEEE)

Full Text Available
From TeAAL to FuseMax: Separation of Concerns for Attention Accelerator Design

https://doi.org/10.1109/MM.2025.3589955

Nayak, Nandeeka; Odemuyiwa, Toluwanimi O; Wu, Xinrui; Pellauer, Michael; Emer, Joel S; Fletcher, Christopher W (July 2025, IEEE Micro)

Free, publicly-accessible full text available July 1, 2026
TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators

https://doi.org/10.1145/3613424.3623791

Nayak, Nandeeka; Odemuyiwa, Toluwanimi O; Ugare, Shubham; Fletcher, Christopher; Pellauer, Michael; Emer, Joel (October 2023, ACM)

Full Text Available
DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators

https://doi.org/10.23919/DATE54114.2022.9774568

Kao, Sheng-Chun; Pellauer, Michael; Parashar, Angshuman; Krishna, Tushar (March 2022, 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators

https://doi.org/10.1145/3485137

Chatarasi, Prasanth; Kwon, Hyoukjun; Parashar, Angshuman; Pellauer, Michael; Krishna, Tushar; Sarkar, Vivek (March 2022, ACM Transactions on Architecture and Code Optimization)

A spatial accelerator’s efficiency depends heavily on both its mapper and cost models to generate optimized mappings for various operators of DNN models. However, existing cost models lack a formal boundary over their input programs (operators) for accurate and tractable cost analysis of the mappings, and this results in adaptability challenges to the cost models for new operators. We consider the recently introduced Maestro Data-Centric (MDC) notation and its analytical cost model to address this challenge because any mapping expressed in the notation is precisely analyzable using the MDC’s cost model. In this article, we characterize the set of input operators and their mappings expressed in the MDC notation by introducing a set of conformability rules. The outcome of these rules is that any loop nest that is perfectly nested with affine tensor subscripts and without conditionals is conformable to the MDC notation. A majority of the primitive operators in deep learning are such loop nests. In addition, our rules enable us to automatically translate a mapping expressed in the loop nest form to MDC notation and use the MDC’s cost model to guide upstream mappers. Our conformability rules over the input operators result in a structured mapping space of the operators, which enables us to introduce a mapper based on our decoupled off-chip/on-chip approach to accelerate mapping space exploration. Our mapper decomposes the original higher-dimensional mapping space of operators into two lower-dimensional off-chip and on-chip subspaces and then optimizes the off-chip subspace followed by the on-chip subspace. We implemented our overall approach in a tool called Marvel, and a benefit of our approach is that it applies to any operator conformable with the MDC notation. We evaluated Marvel over major DNN operators and compared it with past optimizers.
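As an illustration of the kind of loop nest this abstract describes (perfectly nested, affine tensor subscripts, no conditionals), here is a minimal sketch of a 1D convolution. The function name `conv1d` and the choice of operator are assumptions made for illustration; the sketch is not taken from Marvel or the MDC notation.

```python
def conv1d(inputs, weights):
    """Hypothetical 1D convolution written as a perfectly nested loop nest."""
    out_len = len(inputs) - len(weights) + 1
    outputs = [0] * out_len
    # Perfectly nested: the only statement sits in the innermost loop body.
    for x in range(out_len):
        for s in range(len(weights)):
            # Affine subscripts only (x, x + s, s) and no conditionals.
            outputs[x] += inputs[x + s] * weights[s]
    return outputs


print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # prints [-2, -2]
```

A loop nest that guarded the update with an `if`, or indexed a tensor with a non-affine expression such as `x * s`, would fall outside the class the abstract describes.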
Full Text Available
Flexion: A Quantitative Metric for Flexibility in DNN Accelerators

https://doi.org/10.1109/LCA.2020.3044607

Kwon, Hyoukjun; Pellauer, Michael; Parashar, Angshuman; Krishna, Tushar (January 2021, IEEE Computer Architecture Letters)

Full Text Available
Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

https://doi.org/10.1109/HPCA51647.2021.00016

Kwon, Hyoukjun; Lai, Liangzhen; Pellauer, Michael; Krishna, Tushar; Chen, Yu-Hsin; Chandra, Vikas (February 2021, 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available
A Formalism of DNN Accelerator Flexibility

https://doi.org/10.1145/3530907

Kao, Sheng-Chun; Kwon, Hyoukjun; Pellauer, Michael; Parashar, Angshuman; Krishna, Tushar (June 2022, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

The high efficiency of domain-specific hardware accelerators for machine learning (ML) has come from specialization, with the trade-off of less configurability/flexibility. There is growing interest in developing flexible ML accelerators to make them future-proof to the rapid evolution of Deep Neural Networks (DNNs). However, the notion of accelerator flexibility has always been used in an informal manner, restricting computer architects from conducting systematic apples-to-apples design-space exploration (DSE) across trillions of choices. In this work, we formally define accelerator flexibility and show how it can be integrated for DSE. Specifically, we capture DNN accelerator flexibility across four axes: tiling, ordering, parallelization, and array shape. We categorize existing accelerators into 16 classes based on their axes of flexibility support, and define a precise quantification of the degree of flexibility of an accelerator across each axis. We leverage these to develop a novel flexibility-aware DSE framework. We demonstrate how this can be used to perform first-of-their-kind evaluations, including an isolation study to identify the individual impact of the flexibility axes. We demonstrate that adding flexibility features to a hypothetical DNN accelerator designed in 2014 improves runtime on future (i.e., present-day) DNNs by 11.8x geomean.
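One plausible reading of the 16 classes is that each of the four axes is either supported or not, giving 2^4 combinations. The sketch below enumerates them under that assumption; the names `AXES` and `flexibility_classes` are hypothetical and do not come from the paper's framework.

```python
from itertools import product

# The four flexibility axes named in the abstract.
AXES = ("tiling", "ordering", "parallelization", "array shape")


def flexibility_classes():
    """Enumerate every support/no-support combination over the four axes.

    Assumption: a class is fully described by one boolean per axis.
    """
    return [dict(zip(AXES, bits))
            for bits in product((False, True), repeat=len(AXES))]


classes = flexibility_classes()
print(len(classes))  # prints 16
```

The abstract's per-axis degree of flexibility would be a finer-grained quantity layered on top of this binary classification, and it is not modeled here.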
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings

Kwon, Hyoukjun; Chatarasi, Prasanth; Pellauer, Michael; Parashar, Angshuman; Sarkar, Vivek; Krishna, Tushar (July 2020, IEEE Micro)

The efficiency of an accelerator depends on three factors: mapping, deep neural network (DNN) layers, and hardware, which together form an extremely complicated design space for DNN accelerators. To demystify this design space and guide DNN accelerator design toward better efficiency, we propose an analytical cost model, MAESTRO. MAESTRO takes as inputs a DNN model description and hardware resource information, given as a list, and a mapping described in a data-centric representation that we propose. The data-centric representation consists of three directives that enable concise description of mappings in a compiler-friendly form. MAESTRO quickly analyzes various forms of data reuse in an accelerator based on these inputs and generates more than 20 statistics, including total latency, energy, and throughput, as outputs. MAESTRO’s fast analysis enables various optimization tools for DNN accelerators, such as the hardware design exploration tool we present as an example.
Full Text Available
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings

https://doi.org/10.1109/MM.2020.2985963

Kwon, Hyoukjun; Chatarasi, Prasanth; Sarkar, Vivek; Krishna, Tushar; Pellauer, Michael; Parashar, Angshuman (May 2020, IEEE Micro)
Full Text Available
